-
Notifications
You must be signed in to change notification settings - Fork 15
[crashtracking] add doc to outline UNIX pipe communication #1239
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
BenchmarksComparisonBenchmark execution time: 2025-09-25 17:31:04 Comparing candidate commit e6d47ef in PR branch Found 15 performance improvements and 2 performance regressions! Performance is the same for 36 metrics, 2 unstable metrics. scenario:concentrator/add_spans_to_concentrator
scenario:credit_card/is_card_number/37828224631000521389798
scenario:credit_card/is_card_number/x371413321323331
scenario:credit_card/is_card_number_no_luhn/ 3782-8224-6310-005
scenario:credit_card/is_card_number_no_luhn/ 378282246310005
scenario:credit_card/is_card_number_no_luhn/378282246310005
scenario:credit_card/is_card_number_no_luhn/37828224631000521389798
scenario:credit_card/is_card_number_no_luhn/x371413321323331
scenario:normalization/normalize_name/normalize_name/good
CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
Group 14
Group 15
BaselineOmitted due to size. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1239 +/- ##
==========================================
- Coverage 71.69% 71.63% -0.06%
==========================================
Files 354 355 +1
Lines 56138 56316 +178
==========================================
+ Hits 40247 40343 +96
- Misses 15891 15973 +82
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea to have such doc.
I do not want to be picky, but copying code (or having filename + line numbers) into the doc can be burden if tomorrow we change the code (and the code will change). We might forget to change it.
|
||
**File**: `datadog-crashtracker/src/receiver/entry_points.rs:97-119` | ||
|
||
```rust |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need to add the code: if tomorrow the code changes, it means we have to change the doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming it's not too complicated, you can put it all in rustdoc instead of .MD and have the code compile.
4980849
to
e6d47ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I'm part way through the review, but one overall comment so far: a bunch of this seems fairly repetitive.
//! ┌─────────────────────┐ ┌──────────────────────┐ | ||
//! │ Signal Handler │ │ Collector Process │ | ||
//! │ (Original Process) │ │ (Forked Child) │ | ||
//! │ │ │ │ | ||
//! │ 1. Catch crash │────fork()──────────►│ 2. Setup stdio │ | ||
//! │ 2. Fork collector │ │ 3. Create UnixStream │ | ||
//! │ 3. Wait for child │ │ 4. Write crash data │ | ||
//! │ │◄────wait()──────────│ 5. Exit cleanly │ | ||
//! └─────────────────────┘ └──────────────────────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you line up the fork and wait arrows with the lines of code the represent?
/// ┌─────────────────────────────┐ ┌─────────────────────────────┐ | ||
/// │ 1. Catches crash signal │ │ 4. Closes stdio (0,1,2) │ | ||
/// │ 2. Forks collector process │──►│ 5. Disables SIGPIPE │ | ||
/// │ 3. Returns to caller │ │ 6. Creates UnixStream │ | ||
/// │ │ │ 7. Calls emit_crashreport() │ | ||
/// │ │ │ 8. Exits with _exit(0) │ | ||
/// └─────────────────────────────┘ └─────────────────────────────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interleaving of steps here is a little confusing. Can we have whitespace to indicate which steps are concurrent?
}; | ||
|
||
// Emit crashreport | ||
// Create Unix socket stream for crash data transmission |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch
//! 1. **Section Delimiters**: Uses constants from [`crate::shared::constants`] to mark boundaries | ||
//! 2. **Structured Data**: Writes JSON, text, or binary data within sections | ||
//! 3. **Immediate Flushing**: Flushes each section to ensure data integrity | ||
//! 4. **Completion Marker**: Ends transmission with `DD_CRASHTRACK_DONE` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this a little confusing, because the delimiters wrap the structured data, but the list here makes them seem like they only come before.
//! | ||
//! ### Key Sections | ||
//! | ||
//! - **Stack Trace** (`emit_backtrace_by_frames`): Stack frames with optional symbol resolution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still resolve symbols in process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pass a configuration to allow this.
/// | ||
/// ## Section Emission Order | ||
/// | ||
/// The crash report is written in this specific order: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we making this contractural? I'd say this is an implementation detail. The key part is that we try to emit more critical bits of info first, and bits that are more likely to crash last. Bits that are both critical and likely to crash are ¯\_(ツ)_/¯
//! ┌─────────────────┐ socketpair() ┌─────────────────┐ | ||
//! │ Collector │◄───────────────────►│ Receiver │ | ||
//! │ (Crashing proc) │ │ (Fork+execve) │ | ||
//! │ │ Write End │ │ | ||
//! │ uds_parent ────────────────────────────────► stdin (fd=0) │ | ||
//! └─────────────────┘ └─────────────────┘ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix formatting
/// This function never returns - it either successfully executes the receiver binary | ||
/// or terminates the process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this confusing
What does this PR do?
This PR adds a document in the
docs/
folder of this repository to outline how we currently communicate over a UNIX socket to extract crash data in our two process,Collector
andReceiver
, architecture in the CrashtrackerMotivation
Quality of life and DevX for developers wanting to understand Crashtracker architecture
Additional Notes
Note to reviewers: Pressing the

Display the rich diff
button renders the Markdown, potentially making reading and reviewing easierHow to test the change?
Describe here in detail how the change can be validated.
Ticket: https://datadoghq.atlassian.net/browse/PROF-12520